Description

[0001] The present invention relates to a method of 
automatic text processing. In particular, the present in- 
vention relates to an automatic method of generating 
thematic summaries of documents. 
[0002] Document summaries and abstracts serve a 
valuable function by reducing the time required to review 
documents. Summaries and abstracts can be generat- 
ed after document creation either manually or automat- 
ically. Manual summaries and abstracts can be of high 
quality but may be expensive because of the human la- 
bor required. Alternatively, summaries and abstracts 
can be generated automatically. Automatic summaries 
and abstracts can be cheaper to produce, but obtaining 
high quality consistently is difficult.
[0003] Systems for generating automatic summaries 
rely upon one of two computational techniques, natural 
language processing or quantitative content analysis. 
Natural language processing is computationally inten- 
sive. Additionally, producing semantically correct sum- 
maries and abstracts is difficult using natural language 
processing when document content is not limited. 
[0004] Quantitative content analysis relies upon sta- 
tistical properties of text to produce summaries. Gerald 
Salton discusses the use of quantitative content analy- 
sis to summarize documents in "Automatic Text 
Processing" (1989). The Salton summarizer first iso- 
lates text words within a corpus of documents. Next, the 
Salton summarizer flags as title words words used in 
titles, figures, captions, and footnotes. Afterward, the 
frequency of occurrence of the remaining text words 
within the document corpus is determined. The frequen- 
cy of occurrence and the location of text words are then 
used to generate word weights. The Salton summarizer 
uses the word weights to score each sentence of each 
document in the document corpus. These sentence 
scores are used in turn to produce a summary of a pre- 
determined length for each document in the document 
corpus. Summaries produced by the Salton summarizer 
may not accurately reflect the themes of individual doc- 
uments because word weights are determined based 
upon their occurrence across the document corpus, 
rather than within each individual document. 
[0005] Edmundson, H. P.: "New Methods In Automat- 
ic Extracting" Journal of the Association For Computing 
Machinery, vol. 16, April 1969, pages 264-285, 
XP002078269 discloses methods of automatically ex- 
tracting documents for screening purposes, i.e. the 
computer selection of sentences having the greatest po- 
tential for conveying to the reader the substance of the 
document. The described methods treat in addition to 
the presence of high-frequency content words (key- 
words) three components: pragmatic words (cue 
words); title and heading words; and structural indica- 
tors (sentence location). 

[0006] US-A-5 384 703 describes automatically forming a
summary by selecting regions of a document. Each selected
region includes at least two members of a seed list. The seed
list is formed from a predetermined number of the most
frequently occurring complex expressions in the document that
are not on a stop list. If the summary is too long, the region
selection process is performed on the summary to produce a
shorter summary. This region selection process is repeated
until a summary is produced having a desired length. Each time
the region selection process is repeated, the seed list members
are added to the stop list and the complexity level used to
identify frequently occurring expressions is reduced.

[0007] "Method for Automatic Extraction Of Relevant Sentences
From Texts", IBM Technical Disclosure Bulletin, vol. 33,
no. 6A, November 1990, pages 338-339, XP002015802 describes a
method for automatically extracting from a text in any
language the most significant and explicative sentences. All
the words of the text are analyzed and listed according to an
order of importance. Only useful words are considered,
excluding "stop words" such as articles, pronouns,
prepositions, etc. The importance of selected words is
evaluated according to their frequency and position within
the text. Words with high frequency that are located in the
title, introduction or conclusion are considered of increased
importance.

[0008] Black W. J. et al.: "A Practical Evaluation Of Two
Rule-Based Automatic Abstracting Techniques", Expert Systems
For Information Management, vol. 1, no. 3, 1988, pages
159-177, XP002015761 describes automatic abstracting
techniques based on statistical analyses and superficial
syntactic pattern matching of texts, which lend themselves to
implementation using a rule-based approach.

[0009] Luhn, H. P.: "The Automatic Creation Of Literature
Abstracts", IBM Journal, April 1958, pages 159-165,
XP002078270 describes a method for automatically creating an
abstract from an article presented in machine-readable form
to a data-processing machine, whereby statistical information
is derived from word frequency and distribution. This
statistical information is used by the machine to compute a
relative measure of significance, first for individual words
and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the
"auto-abstract".

[0010] An object of the present invention is to auto- 
matically generate improved document summaries that 
accurately reflect the themes of each document. 

[0011] According to the present invention, there is provided
a processor implemented method of generating a thematic
summary of a document presented in machine-readable form to
the processor, the document including a first multiplicity of
sentences and a second multiplicity of terms, the processor
implementing the method by executing instructions stored in
electronic form in a memory device coupled to the processor,
the processor implemented method comprising the steps of:
a) determining a value of a first number of thematic terms
based upon a value of a second number representing a length
of the thematic summary, the first number being less than the
second number; b) selecting the first number of thematic
terms from the second multiplicity of terms; c) scoring each
sentence of the first multiplicity of sentences based upon
occurrence of thematic terms in each sentence; and
d) selecting the second number of thematic sentences from the
first multiplicity of sentences based upon the score of each
sentence.

[0012] The method of the invention automatically produces
readable and semantically correct document
summaries. The method requires a user, at most, to 
specify the length of the desired summary. Document 
summaries may be automatically generated without us- 
ing an iterative approach. 

[0013] A technique for automatically generating the- 
matic summaries of machine readable documents will 
be described. The technique begins with identification 
of thematic terms within the document. Next, each sen- 
tence of the document is scored based upon the number 
of thematic terms contained within the sentence. After- 
ward, the highest scoring sentences are selected as the- 
matic sentences. The present invention will now be de- 
scribed, by way of example with reference to the accom- 
panying drawings, in which: 

Figure 1 illustrates a computer system for automat- 
ically generating thematic summaries of docu- 
ments, and 

Figure 2 is a flow diagram of a method of generating
a thematic summary of a document using the computer
system of Figure 1.

Figure 1 illustrates in block diagram form computer system 10
in which the present method is implemented. The present
method alters the operation of computer system 10, allowing
it to generate a thematic summary of any document presented
in machine readable form. Briefly described, computer system
10 generates a thematic summary by identifying thematic terms
within the document and then scoring each sentence of the
document based upon the number of thematic terms contained
within the sentence. Afterward, computer system 10 selects
the highest scoring sentences as thematic sentences and
presents those sentences to a user of computer system 10.

[0014] Prior to a more detailed discussion of the present
method, consider computer system 10. Computer system 10
includes monitor 12 for visually displaying information to a
computer user. Computer system 10 also outputs information to
the computer user via printer 13. Computer system 10 provides
the computer user multiple avenues to input data. Keyboard 14
allows the computer user to input data to computer system 10
by typing. By moving mouse 16 the computer user is able to
move a pointer displayed on monitor 12. The computer user may
also input information to computer system 10 by writing on
electronic tablet 18 with a stylus or pen 20. Alternatively,
the computer user can input data stored on a magnetic medium,
such as a floppy disk, by inserting the disk into floppy disk
drive 22. Optical character recognition unit (OCR unit) 24
permits the computer user to input hardcopy documents 26 into
the computer system, which OCR unit 24 then converts into a
coded electronic representation, typically American Standard
Code for Information Interchange (ASCII).
[0015] Processor 11 controls and coordinates the operations
of computer system 10 to execute the commands of the computer
user. Processor 11 determines and takes the appropriate
action in response to each user command by executing
instructions stored electronically in memory, either memory
28 or on a floppy disk within disk drive 22. Typically,
operating instructions for processor 11 are stored in solid
state memory 28, allowing frequent and rapid access to the
instructions. Semiconductor memory devices that can be used
include read only memories (ROM), random access memories
(RAM), dynamic random access memories (DRAM), programmable
read only memories (PROM), erasable programmable read only
memories (EPROM), and electrically erasable programmable read
only memories (EEPROM), such as flash memories.
[0016] Figure 2 illustrates in flow diagram form the
instructions 40 executed by processor 11 to generate a
thematic summary of a machine readable document. Instructions
40 may be stored in solid state memory 28 or on a floppy disk
placed within floppy disk drive 22. Instructions 40 may be
realized in any computer language, including LISP and C++.

[0017] Initiating execution of instructions 40 requires 
selection and input of a document in electronic form. If 
desired, prior to initiating execution of instructions 40 the 
computer user may also change the length, denoted "S",
of the thematic summary from the default length. The
default length of the thematic summary may be set to 
any arbitrary number of sentences. In an embodiment 
intended for document browsing, the default length of 
the thematic summary is set to five sentences. 

[0018] Processor 11 responds to selection of a document to be
summarized by branching to step 42. During step 42 processor
11 tokenizes the selected document into words and sentences.
That is to say, processor 11 analyzes the machine readable
representation of the selected document and identifies
sentence boundaries and the words within each sentence.
[0019] Tokenization of natural language text is well known
and therefore will not be described in detail herein.
Additionally, during tokenization, processor 11 assigns a
sentence I.D. to each sentence of the document. In one
embodiment, each sentence is identified by a number
representing its location with respect to the start of the
document. Other methods of identifying the sentences may be
used without affecting the present method. After tokenizing
the selected document, processor 11 branches from step 42 to
step 44.
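As a rough illustration of step 42, the following Python sketch splits a document into sentences and words and assigns each sentence an I.D. equal to its position from the start of the document. The function name `tokenize` and the regular expressions are assumptions made for illustration only; the description leaves the tokenization technique open.

```python
import re

def tokenize(document):
    """Return a list of (sentence_id, words) pairs; IDs start at 1.

    Hypothetical helper: a naive splitter, not the tokenizer of the
    described system.
    """
    # Break after '.', '!' or '?' when followed by whitespace.
    raw_sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    sentences = []
    for text in raw_sentences:
        # Lower-case word tokens made of letters, digits and apostrophes.
        words = re.findall(r"[A-Za-z0-9']+", text.lower())
        if words:
            sentences.append((len(sentences) + 1, words))
    return sentences
```

A production tokenizer would also need to handle abbreviations, quotations and similar cases that this sketch ignores.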
[0020] Processor 11 examines each word token of the document
during step 44 and compares the word to the terms already
included in a term list. If the word token is not yet
included on the list, then processor 11 adds the word to the
term list and notes the sentence I.D. of the sentence in
which the word occurs. On the other hand, if the word is
already on the term list, processor 11 simply adds the
sentence I.D. for that word token to the entry, or list, for
that term. In other words, during step 44 processor 11
generates a data structure associating words of the document
with the location of every occurrence of that term. Thus, for
example, a term list entry of "apostasy, 7, 9, 12" indicates
that the term "apostasy" occurs in sentences 7, 9, and 12 of
the document.
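The term list of step 44 can be pictured as a mapping from each term to the sentence I.D.s of its occurrences. The Python sketch below is one possible realization; the function name `build_term_list` and the input shape (a list of sentence I.D. and word-list pairs) are assumptions, not part of the description.

```python
def build_term_list(sentences):
    """Map each term to the sentence IDs of every occurrence.

    `sentences` is assumed to be a list of (sentence_id, words) pairs.
    A term occurring twice in one sentence contributes that ID twice.
    """
    term_list = {}
    for sentence_id, words in sentences:
        for word in words:
            # First occurrence creates the entry; later ones extend it.
            term_list.setdefault(word, []).append(sentence_id)
    return term_list


example = build_term_list([
    (7, ["apostasy"]),
    (9, ["apostasy", "grew"]),
    (12, ["apostasy"]),
])
# example["apostasy"] is [7, 9, 12], matching the entry in the text.
```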
[0021] Preferably, while generating the term list,
processor 11 filters out stop words. As used herein, "stop
words" are words that do not convey thematic meaning 
and occur very frequently in natural language text. Most 
pronouns, prepositions, determiners, and "to be" verbs 
are classified as stop words. Thus, for example, words 
such as "and, a, the, on, by, about, he, she" are stop 
words. Stop words within the document are identified by 
comparing the word tokens for the document to a list of 
stop words. Eliminating stop words from the term list is 
not necessary, but doing so reduces the total processing 
time required to generate a thematic summary of a doc- 
ument. 
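Stop-word filtering can be sketched as a set lookup applied to the word tokens before they reach the term list. In the Python sketch below, the stop list contains only the example words quoted above; a practical list would be much longer, and the names `STOP_WORDS` and `remove_stop_words` are hypothetical.

```python
# Only the example stop words given in the text; a real list is longer.
STOP_WORDS = frozenset({"and", "a", "the", "on", "by", "about", "he", "she"})

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the word tokens that are not on the stop list."""
    return [word for word in words if word.lower() not in stop_words]
```

Because fewer terms reach the term list, the later counting and sorting steps have less work to do, which is the saving in processing time described above.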

[0022] Processor 11 branches to step 46 from step 44 after
completing the term list. During step 46 processor 11
analyzes the term list to determine the number of times each
term occurs in the document. This is done simply by counting
the number of sentence I.D.s associated with the term. That
done, processor 11 branches to step 50.

[0023] After initiation of execution and prior to execution
of step 50, during step 48, processor 11 determines the
number of thematic terms to be used in selecting thematic
sentences. That number, denoted "K", is determined based upon
the length of the thematic summary; i.e., based upon S. In
general, K should be less than S and greater than 1.
Requiring K to be less than S ensures some commonality of
theme between selected thematic sentences. Preferably, K is
determined according to the equation:

$$K = \begin{cases} S \times c_1 & \text{if } S \times c_1 > 3 \\ 3 & \text{otherwise;} \end{cases}$$

where:

$c_1$ is a constant whose value is less than 1;

S is the number of sentences in the thematic summary; and

K is the number of thematic terms.

In one embodiment, the value of $c_1$ is set equal to 0.7.
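The choice of K in step 48 can be sketched in Python as follows. The description does not say how a fractional product of S and the constant is turned into a whole number of terms; rounding down is an assumption made here, as is the function name.

```python
import math

def number_of_thematic_terms(summary_length, c1=0.7, minimum=3):
    """Return K for a summary of `summary_length` sentences (S).

    K = S * c1 when that product exceeds `minimum`, else `minimum`.
    Rounding down with math.floor is an assumption of this sketch.
    """
    product = summary_length * c1
    if product > minimum:
        return math.floor(product)
    return minimum
```

With the default summary length of five sentences and a constant of 0.7, the product is 3.5, which this sketch rounds down to three thematic terms.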
[0024] Armed with a value for K and the term counts generated
during step 46, processor 11 begins the process of selecting
K thematic terms. During step 50, processor 11 sorts the
terms of the term list according to their counts; i.e., the
total number of occurrences of each term in the document.
Ties between two terms having the same count are broken in
favor of the term including the greatest number of
characters. Having generated a sorted term list and stored
the list in memory, processor 11 branches from step 50 to
step 52. During step 52 processor 11 selects from the sorted
term list the K terms with the highest counts. That done,
processor 11 advances to step 54.
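Steps 46, 50 and 52 can be sketched together in Python: count each term, sort by count with ties going to the longer term, and keep the first K. The function name and the term-list shape (a mapping from term to a list of sentence I.D.s) are assumptions made for illustration.

```python
def select_thematic_terms(term_list, k):
    """Return the k terms with the highest counts.

    `term_list` is assumed to map each term to the list of sentence IDs
    of its occurrences, so the count is the length of that list.
    """
    ranked = sorted(
        term_list,
        # Primary key: count; secondary key: number of characters.
        key=lambda term: (len(term_list[term]), len(term)),
        reverse=True,
    )
    return ranked[:k]


terms = {
    "troops": [1, 2, 3],
    "gulf": [1, 4, 5],
    "country": [2, 3],
    "plan": [6],
}
# "troops" and "gulf" both occur three times; the longer term ranks first.
top_two = select_thematic_terms(terms, 2)
```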

[0025] During step 54 processor 11 computes the total number
of occurrences of the K thematic terms in the document. That
number, denoted "N", is calculated by summing the counts of
the K thematic terms. Processor 11 branches to step 56 from
step 54.
[0026] Having selected the thematic terms and determined
their counts, processor 11 is ready to begin evaluating the
thematic content of the sentences of the document. During
steps 56, 58, 60, and 62, processor 11 considers only those
sentences that include at least one of the K thematic terms.
Processor 11 does so by examining the K highest scoring terms
of the sorted term list. After selecting a term, denoted
$t_s$, during step 56, processor 11 examines each sentence
I.D. associated with $t_s$ during step 58. For each sentence
I.D. associated with $t_s$ processor 11 increments that
sentence's score. Preferably, the score for each sentence is
incremented by s, where s is expressed by the equation:

$$s = \mathrm{count}_{t_s} \, [c_2 + \mathrm{freq}_{t_s}];$$

where:

$\mathrm{count}_{t_s}$ is the number of occurrences of $t_s$
in the sentence;

$c_2$ is a constant having a non-zero, positive value; and

$\mathrm{freq}_{t_s}$ is the frequency of the selected term
$t_s$.

$\mathrm{freq}_{t_s}$ is given by the expression:

$$\mathrm{freq}_{t_s} = \mathrm{count}_{t_s} / N;$$

where:

N represents the total number of occurrences of
thematic terms within the document.

Preferably, $c_2$ is set to a value of one.
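The scoring loop of steps 54 to 60 can be sketched in Python as shown below. Two readings are assumptions of this sketch rather than statements of the description: the count in the frequency expression is taken to be the document-wide count of the term, and each sentence is visited once per term with its score raised by the per-sentence count times the bracketed factor. The function name and data shapes are likewise hypothetical.

```python
from collections import Counter, defaultdict

def score_sentences(term_list, thematic_terms, c2=1.0):
    """Return a mapping from sentence ID to thematic score.

    `term_list` is assumed to map each term to the sentence IDs of all
    its occurrences; `thematic_terms` are the K selected terms.
    """
    # N: total occurrences of the thematic terms in the document.
    n = sum(len(term_list[term]) for term in thematic_terms)
    scores = defaultdict(float)
    for term in thematic_terms:
        # Assumed reading: frequency uses the document-wide count.
        freq = len(term_list[term]) / n
        # Counter yields the number of occurrences per sentence.
        for sentence_id, count in Counter(term_list[term]).items():
            scores[sentence_id] += count * (c2 + freq)
    return dict(scores)


scores = score_sentences({"troops": [1, 1, 2], "gulf": [2]}, ["troops", "gulf"])
```

In this example N is 4, so "troops" has frequency 0.75 and "gulf" 0.25; sentence 1 scores 2 x 1.75 = 3.5 and sentence 2 scores 1.75 + 1.25 = 3.0.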
[0027] Sentence scores can be tracked by generating a
sentence score list during step 58. Each time processor 11
selects a sentence I.D. the sentence score list is examined
to see if it includes that sentence I.D. If not, the sentence
I.D. is added to the sentence score list and its score is
increased as appropriate. On the other hand, if the sentence
score list already includes the particular sentence I.D.,
then the score already associated with the sentence is
incremented in the manner discussed previously.

EP 0 737 927 B1

[0028] After incrementing the scores of all sentences 
associated with the selected term, t s , processor 11 
branches from step 58 to step 60. During step 60 proc- 
essor 11 determines whether all the thematic terms 
have been evaluated. If not, processor 11 returns to step
56 to select another thematic term as the selected term. 
Processor 11 branches through steps 56, 58, and 60 as 
described previously until all of the thematic terms have 
been examined. When that event occurs, processor 11 
branches to step 62 from step 60. 
[0029] During step 62 processor 11 selects as the thematic
summary the S sentences with the highest scores.
Processor 11 does this by sorting the sentence score 
list by score. Having selected the thematic sentences, 
processor 11 may present the thematic summary to the 
user via monitor 12 or printer 13, as well as storing the 
thematic summary in memory 28 or on a floppy disk for
later use. The sentences of the thematic summary are 
preferably presented in their order of occurrence within 
the document. While the sentences may be presented 
in paragraph form, presentation of each sentence indi- 
vidually is preferable because the sentences may not 
logically form a paragraph. Generation of the thematic 
summary complete, processor 11 branches to step 64 
from step 62. 
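Step 62 can be sketched in Python as two sorts: one by score to pick the S best sentences, and one by sentence I.D. to restore document order for presentation. The function name and the score mapping are assumptions made for illustration.

```python
def select_thematic_sentences(scores, s):
    """Return the IDs of the s highest-scoring sentences in document order.

    `scores` is assumed to map sentence IDs to scores, with IDs that
    increase from the start of the document.
    """
    # Highest scores first, then keep the first s sentence IDs.
    top = sorted(scores, key=lambda sentence_id: scores[sentence_id], reverse=True)[:s]
    # Present in order of occurrence within the document.
    return sorted(top)


summary_ids = select_thematic_sentences({1: 3.5, 2: 3.0, 5: 4.0, 9: 0.5}, 2)
```

Here sentence 5 scores highest and sentence 1 second, but the summary lists sentence 1 before sentence 5 because that is their order in the document.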

[0030] Thus, a method of automatically generating 
thematic summaries for documents has been de- 
scribed. The method relies upon quantitative content 
analysis to identify thematic words, which are used in 
turn to identify thematic sentences. Appendix A and
Appendix B include summaries generated using this method.

Appendix A: Summary of Shevardnadze's 
Resignation Speech 

[0031] I have drawn up the text of such a speech, and 
I gave it to the secretariat, and the deputies can acquaint 
themselves with it -- what has been done is the sphere 
of current policy by the country's leadership, by the 
President and by the ministry of Foreign Affairs, and how 
the current conditions are shaping up for the develop- 
ment of the country, for the implementation of the plans 
for our democratization and renewal of the country, for 
economic development and so on. 
[0032] Yesterday there were speeches by some com- 
rades -- they are our veterans - who raised the question 
of the need for a declaration to be adopted forbidding 
the President and the country's leadership from sending 
troops to the Persian Gulf. And these speeches yester- 
day, comrades, filled the cup of patience, to overflowing. 



[0033] On about 10 occasions, both in the country and
abroad, I have had to speak and explain the attitude of 
the Soviet Union toward this conflict. 
[0034] In that case we would have had to strike 
through everything that has been done in recent years
by all of us, by the whole country and by all of our people 
in the field of asserting the principles of the new political 
thinking. 

[0035] Second, I have explained repeatedly, and 
Mikhail Sergeyevich spoke of this in his speech at the
Supreme Soviet, that the Soviet leadership does not
have any plans - I do not know, maybe someone else
has some plans, some group - but official bodies, the
Ministry of Defense - charges are made that the Foreign
Minister plans to land troops in the Persian Gulf, in the
region. 

[0036] The third issue, I said there, and I confirm it 
and state it publicly, that if the interests of the Soviet 
people are encroached upon, if just one person suffers 
- wherever it may happen, in any country, not just in
Iraq but in any other country - yes, the Soviet Govern- 
ment, the Soviet side will stand up for the interests of its 
citizens. 

[0037] I say that, all the same, this is not a random 
event. Excuse me, I am now going to recall the session
of the supreme soviet. On comrade Lukyanov's initia- 
tive, literally just before the start of a meeting, a serious 
matter was included on the agenda about the treaties 
with the german democratic republic. 
[0038] I cannot reconcile myself with what is happening
in my country and to the trials which await our people.

Appendix B: Summary of "Research that Reinvents 
the Corporation" by John Seely Brown

[0039] As companies try to keep pace with rapid 
changes in technology and cope with increasingly un- 
stable business environments, the research department 
has to do more than simply innovate new products.
[0040] Over the next decade, PARC researchers 
were responsible for some of the basic innovations of 
the personal-computer revolution -- only to see other
companies commercialize these innovations more
quickly than Xerox.

[0041] One popular answer to these questions is to 
shift the focus of the research department away from 
radical breakthroughs toward incremental innovation, 
away from basic research toward applied research. 
[0042] Our emphasis on pioneering research led us
to redefine what we mean by technology, by innovation, 
and indeed by research itself. 

[0043] Such activities are essential for companies to 
exploit successfully the next great breakthrough in
information technology: "ubiquitous computing," or the
incorporation of information technology in a broad range
of everyday objects. 

[0044] When corporate research begins to focus on a 
company's practice as well as its products, another
principle quickly becomes clear: innovation isn't the
privileged activity of the research department. At PARC, we
are studying this process of local innovation with em- 
ployees on the front lines of Xerox's business and de- 
veloping technologies to harvest its lessons for the com- 
pany as a whole. 

[0045] The result: important contributions to Xerox's 
core products but also a distinctive approach to innova- 
tions with implications far beyond our company. 



Claims 

1. A processor implemented method of generating a 
thematic summary of a document (26) presented in 
machine-readable form to the processor (11), the 
document (26) including a first multiplicity of sen- 
tences and a second multiplicity of terms, the proc- 
essor (11) implementing the method by executing 
instructions stored in electronic form in a memory 
device (28) coupled to the processor (11), the
processor implemented method comprising the steps of:

a) determining (48) a value of a first number of 
thematic terms based upon a value of a second 
number representing a length of the thematic 
summary, the first number being less than the 
second number; 

b) selecting (52) the first number of thematic 
terms from the second multiplicity of terms; 

c) scoring (58) each sentence of the first multi- 
plicity of sentences based upon occurrence 
(54) of thematic terms in each sentence; and 

d) selecting (62) the second number of thematic 
sentences from the first multiplicity of sentenc- 
es based upon the score of each sentence. 

2. The processor implemented method of claim 1 
wherein step b) comprises: 

i) determining (46) a number of times each term 
of the second multiplicity of terms occurs in the 
document, and 

ii) selecting (52) the first number of thematic 
terms from the second multiplicity of terms 
based upon the number of times (50) each term 
occurs in the document. 

3. The processor implemented method of claim 1 or 
claim 2 further comprising the step of: 

e) presenting the thematic sentences to a user
of the processor (11) in an order in which the
thematic sentences occur in the document.

4. The processor implemented method of any one of
claims 1 to 3 wherein step c) comprises incrementing
(58) the score of each sentence for each thematic
term occurring in the sentence by an amount
related to the frequency of occurrence of the
thematic term within the document.

5. The processor implemented method of any one of
claims 1 to 4 comprising, prior to step a), the step 
of receiving the value of the second number from 
an input device (14) coupled to the processor (11). 

6. The processor implemented method of any one of
claims 1 to 5 wherein the first number is at least 
three. 



1. Prozessorimplementiertes Verfahren zum Erzeugen
einer thematischen Zusammenfassung eines
Dokuments (26), das dem Prozessor (11) in
maschinenlesbarer Form präsentiert wird, wobei das
Dokument (26) eine erste Vielzahl von Sätzen und
eine zweite Vielzahl von Termen umfasst und der
Prozessor (11) das Verfahren durch Ausführen von
Instruktionen implementiert, die in elektronischer
Form in einer Speichereinrichtung (28), die mit dem
Prozessor (11) gekoppelt ist, gespeichert sind,
wobei das prozessorimplementierte Verfahren die
folgenden Schritte aufweist:

a) Bestimmen (48) eines Wertes einer ersten
Anzahl von thematischen Termen beruhend auf
einem Wert einer zweiten Anzahl, die eine Länge
der thematischen Zusammenfassung repräsentiert,
wobei die erste Anzahl kleiner als die
zweite Anzahl ist;

b) Auswählen (52) der ersten Anzahl von
thematischen Termen von der zweiten Vielzahl
von Termen;

c) Bewerten (58) jedes Satzes der ersten
Vielzahl von Sätzen beruhend auf eines Auftretens
(54) von thematischen Termen in jedem Satz;
und

d) Auswählen (62) der zweiten Anzahl von
thematischen Sätzen von der ersten Vielzahl von
Sätzen beruhend auf der Bewertung jedes Satzes.

2. Prozessorimplementiertes Verfahren nach
Anspruch 1, wobei Schritt b) aufweist:

i) Bestimmen (46) einer Häufigkeit, mit der
jeder Term der zweiten Vielzahl von Termen in
dem Dokument auftritt, und

ii) Auswählen (52) der ersten Anzahl von
thematischen Termen von der zweiten Vielzahl
von Termen beruhend auf der Häufigkeit (50),
mit der jeder Term in dem Dokument auftritt.

3. Prozessorimplementiertes Verfahren nach
Anspruch 1 oder 2, des Weiteren aufweisend den
Schritt:

e) Präsentieren der thematischen Sätze zu
einem Benutzer des Prozessors (11) in einer
Reihenfolge, in der die thematischen Sätze in dem
Dokument auftreten.

4. Prozessorimplementiertes Verfahren nach einem
der Ansprüche 1 bis 3, wobei Schritt c) ein
Inkrementieren (58) der Bewertung von jedem Satz
umfasst, für jeden thematischen Term, der in dem Satz
auftritt, um einen Betrag, der sich auf die Häufigkeit
des Auftretens des thematischen Terms innerhalb
des Dokuments bezieht.

5. Prozessorimplementiertes Verfahren nach einem
der Ansprüche 1 bis 4, umfassend, vor Schritt a),
den Schritt des Empfangens des Werts der zweiten
Anzahl von einer Eingabeeinrichtung (14), die mit
dem Prozessor (11) gekoppelt ist.

6. Prozessorimplementiertes Verfahren nach einem
der Ansprüche 1 bis 5, wobei die erste Anzahl
mindestens 3 ist.



Revendications 

1. Procédé mis en oeuvre par un processeur de
création d'un résumé thématique d'un document (26)
présenté sous forme pouvant être lu par une
machine au processeur (11), le document (26)
comprenant une première variété de phrases et une
seconde variété de termes, le processeur (11) mettant en
oeuvre le procédé en exécutant les instructions
stockées sous forme électronique dans un dispositif
à mémoire (28) accouplé au processeur (11), le
procédé mis en oeuvre par un processeur comprenant
les étapes :

a) de détermination (48) d'une valeur d'un
premier nombre de termes thématiques sur la
base d'une valeur d'un second nombre
représentant une longueur du résumé thématique, le
premier nombre étant inférieur au second
nombre ;

b) de sélection (52) du premier nombre de
termes thématiques à partir de la seconde variété
de termes ;

c) d'évaluation (58) de chaque phrase de la
première variété de phrases sur la base de
l'apparition (54) de termes thématiques dans chaque
phrase ; et

d) de sélection (62) du second nombre de
phrases thématiques à partir de la première variété
de phrases sur la base de l'évaluation de
chaque phrase.

2. Le procédé mis en oeuvre par un processeur selon
la revendication 1, dans lequel l'étape b)
comprend :

i) la détermination (46) d'un nombre de fois que
chaque terme de la seconde variété de termes
apparaît dans le document, et

ii) la sélection (52) du premier nombre de
termes thématiques à partir de la seconde variété
de termes sur la base du nombre de fois (50)
que chaque terme apparaît dans le document.

3. Le procédé mis en oeuvre par un processeur selon
la revendication 1 ou la revendication 2 comprenant
en outre l'étape de :

e) présentation des phrases thématiques à un
utilisateur du processeur (11) dans un ordre
dans lequel les phrases thématiques
apparaissent dans le document.

4. Le procédé mis en oeuvre par un processeur selon
l'une quelconque des revendications 1 à 3, dans
lequel l'étape c) comprend l'incrémentation (58) de
l'évaluation de chaque phrase pour chaque terme
thématique apparaissant dans la phrase d'une
quantité relative à la fréquence d'apparition du
terme thématique dans le document.

5. Le procédé mis en oeuvre par un processeur selon
l'une quelconque des revendications 1 à 4,
comprenant, avant l'étape a), l'étape de réception de la
valeur du second nombre à partir d'un dispositif
d'entrée (14) accouplé au processeur (11).

6. Le procédé mis en oeuvre par un processeur selon
l'une quelconque des revendications 1 à 5, dans
lequel le premier nombre est au moins trois.